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Internet Delay Experiments 


This memo reports on some measurement experiments and suggests some possible 
improvements to the TCP retransmission timeout calculation. This memo is 
both a status report on the measurements and advice to implementers of TCP. 


1. Introduction 


This memorandum describes two series of experiments designed to explore 
the transmission characteristics of the Internet system. One series of 
experiments was designed to determine the network delays with respect to 
packet length, while the other was designed to assess the effectiveness of the 
TCP retransmission-timeout algorithm specified in the standards documents. 
Both sets of experiments were conducted during the October - November 1983 
time frame and used many hosts distributed throughout the Internet system. 


The objectives of these experiments were first to accumulate experimental 
data on actual network paths that could be used as a benchmark of Internet 
system performance, and second to apply these data to refine individual TCP 
implementations and improve their performance. 


The experiments were done using a specially instrumented measurement host 
called a Fuzzball, which consists of an LSI-11 running IP/TCP and various 
application-layer protocols including TELNET, FTP and SMTP mail. Among the 
various measurement packages is the original PING (Packet InterNet Groper) 
program used over the last six years for numerous tests and measurements of 
the Internet system and its client nets. This program contains facilities to 
send various kinds of probe packets, including ICMP Echo messages, process the 
reply and record elapsed times and other information in a data file, as well 
as produce real-time snapshot histograms and traces. 


Following an experiment run, the data collected in the file were reduced 
by another set of programs and plotted on a Peritek bit-map display with color 
monitor. The plots have been found invaluable in the indentification and 
understanding of the causes of netword glitches and other "zoo" phenomena. 
Finally, summary data were extracted and presented in this memorandum. The 
raw data files, including bit-map image files of the various plots, are 
available to other experimenters upon request. 


The Fuzzballs and their local-net architecture, called DCN, have about 
two-dozen clones scattered worldwide, including one (DCN1) at the Linkabit 
Corporation offices in McLean, Virginia, and another at the Norwegian 
Telecommunications Adminstration (NTA) near Oslo, Norway. The DCN1 Fuzzball 
is connected to the ARPANET at the Mitre IMP by means of 1822 Error Control 
Units operating over a 56-Kbps line. The NTA Fuzzball is connected to the 
NTARE Gateway by an 1822 interface and then via VDH/HAP operating over a 
9.6-Kbps line to SATNET at the Tanum (Sweden) SIMP. For most experiments 
described below, these details of the local connectivity can be ignored, since 
only relatively small delays are involved. 
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The remote test hosts were selected to represent canonical paths in the 
Internet system and were scattered all over the world. They included some on 
the ARPANET, MILNET, MINET, SATNET, TELENET and numerous local nets reachable 
via these long-haul nets. As an example of the richness of the Internet 
system connectivity and the experimental data base, data are included for 
three different paths from the ARPANET-based measurement host to London hosts, 
two via different satellite links and one via an undersea cable. 


2. Packet Length Versus Delay 


This set of experiments was designed to determine whether delays across 
the Internet are significantly influenced by packet length. In cases where 
the intrinsic propagation delays are high relative to the time to transmit an 
individual packet, one would expect that delays would not be strongly affected 
by packet length. This is the case with satellite nets, including SATNET and 
WIDEBAND, but also with terrestrial nets where the degree of traffic 
aggregation is high, so that the measured traffic is a small proportion of the 
total traffic on the path. However, in cases where the intrinsic propagation 
delays are low and the measured traffic represents the bulk of the traffic on 
the path, quite the opposite would be expected. 


The objective of the experiments was to assess the degree to which TCP 
performance could be improved by refining the retransmission-timeout algorithm 
to include a dependency on packet length. Another objective was to determine 
the nature of the delay characteristic versus packet length on tandem paths 
spanning networks of widely varying architectures, including local-nets, 
terrestrial long-haul nets and satellite nets. 


2.1. Experiment Design 


There were two sets of experiments to measure delays as a function of 
packet length. One of these was based at DCN1, while the other was based at 
NTA. All experiments used ICMP Echo/Reply messages with embedded timestamps. 
A cycle consisted of sending an ICMP Echo message of specified length, waiting 
for the corresponding ICMP Reply message to come back and recording the 
elapsed time (normalized to one-way delay). An experiment run, resulting in 
one line of the table below, consisted of 512 of these volleys. 


The length of each ICMP message was determined by a random-number 
generator uniformly distributed between zero and 256. Lengths less than 40 
were rounded up to 40, which is the minimum datagram size for an ICMP message 
containing timestamps and just happens to also be the minimum TCP segment 
size. The maximum length was chosen to avoid complications due to 
fragmentation and reassembly, since ICMP messages are not ordinarily 
fragmented or reassembled by the gateways. 


The data collected were first plotted as a scatter diagram on a color 
bit-map display. For all paths involving the ARPANET, this immediately 
revealed two distinct characteristics, one for short (single-packet) messages 
less than 126 octets in length and the other for long (multi-packet) messages 
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longer than this. Linear regression lines were then fitted to each 
characteristic with the results shown in the following table. (Only one 
characteristic was assumed for ARPANET-exclusive paths.) The table shows for 
each host the delays, in milliseconds, for each type of message along with a 
rate computed on the basis of these delays. The "Host ID" column designates 
the host at the remote end of the path, with a letter suffix used when 
necessary to identify a particular run. 
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Host Single-packet Rate Multi-packet Rate Comments 

ID 40 125 (bps) 125 256 (bps) 

DCN1 to nearby local-net hosts (calibration) 

DCN5 9 13 366422 DMA 1822 

DCN8 14 20 268017 Ethernet 

IMP17 22 60 45228 56K 1822/ECU 
FORD1 93 274 9540 9600 DDCMP base 
UMD1 102 473 4663 4800 synch 

DCN6 188 550 4782 4800 DDCMP 

FACC 243 770 3282 9600/4800 DDCMP 
FOE 608 1917 1320 9600/14.4K stat mux 


DCN1 to ARPANET hosts and local nets 


MILARP 61 105 15358 13.3 171 27769 MILNET gateway 
ISID-L 166 263 6989 403 472 15029 low-traffic period 
SCORE 184 318 5088 541 608 15745 low-traffic period 
RVAX 231 398 4061 651 740 11781 Purdue local net 
AJAX 322 578 2664 944 1081 7681 MIT local net 
ISID-H 333 520 3643 715 889 6029 high-traffic period 
BERK 336 967 1078 1188 1403 4879 UC Berkeley 

WASH 498 776 2441 1256 1348 11379 U Washington 

DCN1 to MILNET/MINET hosts and local nets 

ISIA-L 460 563 6633 1049 1140 11489 low-traffic period 
ISIA-H 564 841 2447 1275 1635 2910 high-traffic period 
BRL 560 973 1645 1605 1825 4768 BRL local net 

LON 585 835 2724 1775 1998 4696 MINET host (London) 
HAWAII 679 980 2257 1817 1931: 9238 a long way off 
OFFICE3 762 1249 1396 2283 2414 8004 heavily loaded host 
KOREA 897 1294 1712 og bikie) 2770 19652 a long, long way off 
DCN1 to TELENET hosts via ARPANET 

RICE 1456 2358 754 3086 3543 2297 via VAN gateway 
DCN1 to SATNET hosts and local nets via ARPANET 

UCL 1089 1240 4514 1426 1548 8558 UCL zoo 

NTA-L 1132 1417 2382 1524 1838 3339 low-traffic period 
NTA-H 1247 1504 2640 1681 1811 8078 high-traffic period 
NTA to SATNET hosts 

TANUM 107 368 6625 9600 bps Tanum line 
ETAM 964 1274 5576 Etam channel echo 


GOONY 972 1256 6082 Goonhilly channel echo 
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2.2 Analysis of Results 


The data clearly show a strong correlation between delay and length, with 
the longest packets showing delays two to three times the shortest. On paths 
via ARPANET clones the delay characteristic shows a stonger correlation with 
length for single-packet messages than for multi-packet messages, which is 
consistent with a design which favors low delays for short messages and high 
throughputs for longer ones. 


Most of the runs were made during off-peak hours. In the few cases where 
runs were made for a particular host during both on-peak and off-peak hours, 
comparison shows a greater dependency on packet length than on traffic shift. 


TCP implementors should be advised that some dependency on packet length 
may have to be built into the retransmission-timeout estimation algorithm to 
insure good performance over lossy nets like SATNET. They should also be 
advised that some Internet paths may require stupendous timeout intervals 
ranging to many seconds for the net alone, not to mention additional delays on 
host-system queues. 


I call to your attention the fact that the delays (at least for the 
larger packets) from ARPANET hosts (e.g. DCN1) to MILNET hosts (e.g. ISIA) 


are in the same ballpark as the delays to SATNET hosts (e.g. UCL)! I have 
also observed that the packet-loss rates on the MILNET path are at present not 
neglible (18 in 512 for ISIA-2). Presumably, the loss is in the gateways; 


however, there may well be a host or two out there swamping the gateways with 
retransmitted data and which have a funny idea of the "normal" timeout 
interval. The recent discovery of a bug in the TOPS-20 TCP implementation, 
where spurious ACKs were generated at an alarming rate, would seem to confirm 
that suspicion. 


3. Retransmission-Timeout Algorithm 


One of the basic features of TCP which allow it to be used on paths 
spanning many nets of widely varying delay and packet-loss characteristics is 
the retranansmission-timeout algorithm, sometimes known as the "RSRE 
Algorithm" for the original designers. The algorithm operates by recording 
the time and initial sequence number when a segment is transmitted, then 
computing the elapsed time for that sequence number to be acknowledged. There 
are various degrees of sophistication in the implementation of the algorithm, 
ranging from allowing only one such computation to be in progress at a time to 
allowing one for each segment outstanding at a time on the connection. 


The retransmission-timeout algorithm is basically an estimation process. 
It maintains an extimate of the current roundtrip delay time and updates it as 
new delay samples are computed. The algorithm smooths these samples and then 
establishes a timeout, which if exceeded causes a retransmission. The 
selection of the parameters of this algorithm are vitally important in order 
to provide effective data transmission and avoid abuse of the Internet system 
by excessive retransmissions. I have long been suspicious of the parameters 


Internet Delay Experiments Page 6 
D.L. Mills 


suggested in the specification and used in some implementations, especially in 
cases involving long-delay paths involving lossy nets. The experiment was 
designed to simulate the operation of the algorithm using data collected from 
real paths involving some pretty leaky Internet plumbing. 


3.1. Experiment Design 


The experiment data base was constructed of well over a hundred runs 
using ICMP Echo/Reply messages bounced off hosts scattered all over the world. 
Most runs, including all those summarized here, consisted of 512 echo/reply 
cycles lasting from several seconds to twenty minutes or so. Other runs 
designed to detect network glitches lasted several hours. Some runs used 
packets of constant length, while others used different lengths distributed 
from 40 to 256 octets. The maximum length was chosen to avoid complications 
fragmented or reassembled by the gateways. 


The object of the experiment was to simulate the packet delay 
distribution seen by TCP over the paths measured. Only the network delay is 
of interest here, not the queueing delays within the hosts themselves, which 
can be considerable. Also, only a single packet was allowed in flight, so 
that stress on the network itself was minimal. Some tests were conducted 
during busy periods of network activity, while others were conducted during 
quiet hours. 


The 512 data points collected during each run were processed by a program 
which plotted on a color bit-map display each data point (x,y), where x 
represents the time since initiation of the experiment the and y the measured 
delay, normalized to the one-way delay. Then, the simulated 
retransmission-timeout algorithm was run on these data and its computed 
timeout plotted in the same way. The display immediately reveals how the 
algorithm behaves in the face of varying traffic loads, network glitches, lost 
packets and superfluous retransmissions. 


Each experiment run also produced summary statistics, which are 
summarized in the table below. Each line includes the Host ID, which 
identifies the run. The suffix -1 indicates 40-octet packets, -2 indicates 
256-octet packets and no suffix indicates uniformly distributed lengths 
between 40 and 256. The Lost Packets columns refer to instances when no ICMP 
Reply message was received for thirty seconds after transmission of the ICMP 
Echo message, indicating probable loss of one or both messages. The RTX 
Packets columns refer to instances when the computed timeout is less than the 
measured delay, which would result in a superfluous retransmission. For each 
of these two types of packets the column indicates the number of instances 
and the Time column indicates the total accumulated time required for the 
recovery action. 


For reference purposes, the Mean column indicates the computed mean delay 
of the echo/reply cycles, excluding those cycles involving packet loss, while 
the CoV column indicates the coefficient of variation. Finally, the Eff 
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column indicates the efficiency, computed as the ratio of the total time 
accumulated while sending good data to this time plus the lost-packet and 
rtx-packet time. 


Complete sets of runs were made for each of the hosts in the table below 
for each of several selections of algorithm parameters. The table itself 
reflects values, selected as described later, believed to be a good compromise 
for use on existing paths in the Internet system. 


Internet Delay Experiments 


D.L. Mills 

Host Total Lost Packets RTX Packets 
ID Time Time Time 
DCN1 to nearby local-net hosts (calibration) 
DCN5 5 0 0 (0) (0) 
DCN8 8 0 0 (0) 0) 
IMP17 19 0 0 0 0 
FORD1 86 0 0 1 2 
UMD 1 135 0 0 2 .5 
DCN6 177 0 (0 0 0 
FACC 368 196 2222511 6 Di. 
FOE 670 3 T9 21 TITI 
FOE-1 374 (0) 0 26 61.9 
FOE-2 1016 3 16.7 10 47.2 
DCN1 to ARPANET hosts and local nets 

MILARP 59 (0) 0 2 .5 
ISID 163 (0) 0 1 1.8 
ISID-1 84 0 0 2 I 
ISID-2 281 0) 0 3 17 
ISID * 329 0 0 5 12.9 
SCORE 208 0 0 1 8 
RVAX 256 1 1.3 0 0 
AJAX 365 (0 0 0 0 
WASH 494 0 (0 2 2.8 
WASH-1 271 0 0 5 8 
WASH-2 749 1 9.8 2 1745 
BERK 528 20 501 4 25 


DCN1 to MILNET/MINET hosts and local nets 
ISIA 436 4 7.4 2 15.7 


ISIA-1 197 0 0 0 0 
ISIA-2 615 0 0 2 I5 
ISIA * 595 18 54.1 6 33.3 
BRL 644 1 3 1 19 
BRL-1 318 0 0 4 13.6 
BRL-2 962 2 8.4 0 0 
LON 677 0 0 3 11.7 
LON-1 302 0 0 0 0 
LON-2 1047 0 0 0 0 
HAWAII 709 4 12.9 3 18.5 
OFFICE3 856 3 12.9 3 10.3 
OFF3-1 432 2 4.2 2 6.9 
OFF3-2 1277 7 39 3 41.5 
KOREA 1048 3 14.5 2 18.7 
KOREA-1 506 4 8.6 1 2.2 
KOREA-2 1493 6 35.5 2 19.3 
DCN1 to TELENET hosts via ARPANET 


RICE 677 2 6.8 3 12.1 


1286 


41 


97 


Page 8 


Internet Delay Experiments 


SATNET hosts and local nets via ARPANET 


D.L. Mills 
RICE-1 368 
RICE-2 1002 
DCN1 to 

UCL 689 
UCL-1 623 
UCL-2 818 
NTA 779 
NTA-1 616 
NTA-2 971 


NTA to SATNET hosts and 


TANUM 
GOONY 
ETAM 
UCL 


Note: 


activity. 


110 
587 
608 
612 


1 
1 


9 
39 
4 
12 
24 
19 


3 
19 
32 
5 


ct 3 
4.4 1 


26.8 0 
92.8 2 
T345 0 
38.7 1 
56.6 2 
71.1 0 


1.6 0 
44.2 1 
76.3 I 
12.6 2 


2 
9. 


0 


oOuUwWOU 


OWNO 


3 
5 


23 


aw 


9 
aL 
z9 


715 
1930 


1294 
1025 
1571 
1438 
1083 
1757 


213 

1056 
1032 
1154 


LI 


AL 
v23 
x29 
.24 


w99 
.98 


.96 
.84 
.98 
s94 
89 
92 


298 
9L 
.386 
.96 
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* indicates randomly distributed packets during periods of high ARPANET 
The same entry without the * indicates randomly distributed packets 
during periods of low ARPANET activity. 
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3.2 Discussion of Results 


It is immediately obvious from visual inspection of the bit-map display 
that the delay distribution is more-or-less Poissonly distributed about a 
relatively narrow range with important exceptions. The exceptions are 
characterized by occasional spasms where one or more packets can be delayed 
many times the typical value. Such glitches have been commonly noted before 
on paths involving ARPANET and SATNET, but the true impact of their occurance 
on the timeout algorithm is much greater than I expected. What commonly 
happens is that the algorithm, when confronted with a short burst of 
long-delay packets after a relatively long interval of well-mannered behavior, 
takes much too long to adapt to the spasm, thus inviting many superfluous 
retransmissions and leading to congestion. 


The incidence of long-delay bursts, or glitches, varied widely during the 
experiments. Some of them were glitch-free, but most had at least one glitch 
in 512 echo/reply volleys. Glitches did not seem to correlate well with 
increases in baseline delay, which occurs as the result of traffic surges, nor 
did they correlate well with instances of packet loss. I did not notice any 
particular periodicity, such as might be expected with regular pinging, for 
example; however, I did not process the data specially for that. 


There was no correction for packet length used in any of these 
experiments, in spite of the results of the first set of experiments described 
previously. This may be done in a future set of experiments. The algorithm 
does cope well in the case of constant-length packets and in the case of 
randomly distributed packet lengths between 40 and 256 octets, as indicated in 
the table. Future experiments may involve bursts of short packets followed by 
bursts of longer ones, so that the speed of adaptation of the algorithm can be 
directly deterimend. 


One particularily interesting experiment involved the FOE host 
(FORD-FOE), which is located in London and reached via a 14.4-Kbps undersea 
cable and statistical multiplexor. The multiplexor introduces a moderate mean 
delay, but with an extremely large delay dispersion. The specified 
retransmission-timeout algorithm had a hard time with this circuit, as might 
be expected; however, with the improvments described below, TCP performance 
was acceptable. It is unlikely that many instances of such ornery circuits 
will occur in the Internet system, but it is comforting to know that the 
algorithm can deal effectively with them. 


3.3. Improvments to the Algorithm 


The specified retransmission-timeout algorithm, really a first-order 
linear recursive filter, is characterized by two parameters, a weighting 
factor F and a threshold factor G. For each measured delay sample R the delay 


= 


estimator E is updated: 
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= 


Then, if an interval equal to G*E expires after transmitting a packet, the 
packet is retransmitted. The current TCP specification suggests values in the 
range 0.8 to 0.9 for F and 1.5 to 2.0 for G. These values have been believed 
reasonable up to now over ARPANET and SATNET paths. 


I found that a simple change to the algorithm made a worthwhile change in 
the efficiency. The change amounts to using two values of F, one (F1) when R 
< E in the expression above and the other (F2) when R >= E, with Fl > F2. The 
effect is to make the algorithm more responsive to upward-going trends in 
delay and less respnsive to downward-going trends. After a number of trials I 
concluded that values of Fl = 15/16 and F2 = 3/4 (with G = 2) gave the best 


all-around performance. The results on some paths (FOE, ISID, ISIA) were 
better by some ten percent in efficiency, as compared to the values now used 
in typical implementations where F = 7/8 and G = 2. The results on most paths 


were better by five percent, while on a couple (FACC, UCL) the results were 
worse by a few percent. 


There was no clear-cut gain in fiddling with G. The value G = 2 seemed 
to represent the best overall compromise. Note that increasing G makes 
superfluous retransmissions less likely, but increases the total delay when 
packets are lost. Also, note that increasing F2 too much tends to cause 
overshoot in the case of network glitches and leads to the same result. The 
table above was constructed using Fl = 15/16, F2 = 3/4 and G = 2. 


Readers familiar with signal-detection theory will recognize my 
suggestion as analogous to an ordinary peak-detector circuit. Fl represents 
the discharge time-constant, while F2 represents the charge time-constant. G 
represents a "squelch" threshold, as used in voice-operated switches, for 
example. Some wag may be even go on to suggest a network glitch should be 
called a netspurt. 
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Appendix. Index of Test 


Name Address 


DCN1 to nearby local-net 


DCN5 128.4.0.5 
DCN8 128.4.0.8 
IMP17 10% 3017 
FORD1 T2850 
UMD1 128.8.0.1 
DCN6 128.4.0.6 
FACC 128.5.32.1 
FOE 1283540-15 


DCN1 to ARPANET hosts an 
MILARP 10.2.0.28 


ISID 10.0.0.27 
SCORE 10.30.11 
RVAX 128.10.0.2 
AJAX 18.10.0.64 
WASH 10.0.0.91 
BERK 10.2.0.78 


DCN1 to MILNET/MINET hos 


ISIA 26.3.0.103 
BRL 192.5.21.6 
LON 24.0.0.7 


HAWAII 26.1.0.36 
OFFICE3 26.2.0.43 
KOREA 26.0.0.117 


DCN1 to TELENET hosts vi 
RICE 1:4030 212 


UCL 128.16.9.0 
NTA 128.39.0.2 


NTA to SATNET hosts and 
TANUM 4.0.0.64 
GOONY 4.0.0.63 
ETAM 4.0.0.62 


Hosts 


NIC Host Name 


DCN-GATEWAY 
FORD1 

UMD 1 

DCN6 
FORD-WDL1 
FORD-FOE 


d local nets 
ARPA-MILNET-GW 
USC-ISID 
SU-SCORE 
PURDUE-MORDRED 
MIT-AJAX 
WASHINGTON 
UCB-VAX 


ts and local nets 
USC-ISIA 

BRL-VGR 
MINET-LON-EM 
HAWAII-EMH 
OFFICE-3 
KOREA-EMH 


a ARPANET 
RICE 


DCN1 to SATNET hosts and local nets via ARPANET 


UCL-SAM 
NTARE1 


local nets 
TANUM-ECHO 
GOONHILLY-ECHO 
ETAM-ECHO 
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